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Abstract. 

We consider the problem of solving a linear system Ax = b when A is nearly symmetric and when 
the system is preconditioned by a symmetric positive definite matrix M . In the symmetric case, one 
can recover symmetry by using M-inner products in the conjugate gradient (CG) algorithm. This 
idea can also be used in the nonsymmetric case, and near symmetry can be preserved similarly. Like 
CG, the new algorithms are mathematically equivalent to split preconditioning, but do not require 
M to be factored. Better robustness in a specific sense can also be observed. When combined with 
truncated versions of iterative methods, tests show that this is more effective than the common practice 
of forfeiting near-symmetry altogether. 
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Abstract 

We consider the problem of solving a linear system Ax = b when A is nearly sym- 
metric and when the system is preconditioned by a symmetric positive definite matrix 
M . In the symmetric case, one can recover symmetry by using M - inner products in the 
conjugate gradient (CG) algorithm. This idea can also be used in the nonsymmetric 
case, and near symmetry can be preserved similarly. Like CG, the new algorithms are 
mathematically equivalent to split preconditioning, but do not require M to be fac- 
tored. Better robustness in a specific sense can also be observed. When combined with 
truncated versions of iterative methods, tests show that this is more effective than the 
common practice of forfeiting near-symmetry altogether. 


1 Introduction 

Consider the solution of the linear system 


Ax = b (1) 
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by a preconditioned Krylov subspace method. Assume at first that A is symmetric positive 
definite (SPD) and let M be an SPD matrix that is a preconditioner for the matrix A. Then 
one possibility is to solve either the left-preconditioned system 

M~ 1 Ax = M~ 1 b (2) 

or the right-preconditioned system 

AM~ l u = 6, x = M~ l u . (3) 

Both of these systems have lost their symmetry. A remedy exists when the preconditioner 
M is available in factored form, e.g., as an incomplete Choleski factorization, 

M = LL T 


in which case a simple way to preserve symmetry is to ‘split’ the preconditioner between left 
and right, i.e., we could solve 


L 1 AL T u = L l b, x — L~ T u , (4) 

which involves a symmetric positive definite matrix. This can also be done when M can 
be factored as M — M 1 / 2 x M 1 / 2 . Unfortunately, the requirement that M be available in 
factored forms is often too stringent. 

However, this remedy is not required. As is well-known, we can preserve symmetry by 
using a different inner product. Specifically, we observe that M~ l A is self-adjoint for the 
M-inner product, 

(x,y) M = (Mx,y) = (x,My) 

since we have 

(M~ l Ax,y) M = (Ax,y) = {x,Ay) = (x, M(M~ 1 A)y) = {x,M~ l Ay) M . 

If we rewrite the CG-algorithm for this new inner product, denoting by r 3 =b — Ax 3 the 
original residual and by Zj — M~ l rj the residual for the preconditioned system, we would 
obtain the following algorithm, see, e.g., [4], 


Algorithm 1.1 Preconditioned Conjugate Gradient 

1. Start: Compute r 0 = b - Ax 0 ; z Q = M~ l r Q ; and p 0 = z 0 . 

2. Iterate: For j = 0, 1, . . . , until convergence do, 

(a) a 3 = (r J ,z J )/(Ap J ,p J ) 

(b) x J+1 = x 3 + otjPj 

(c) r J+l = Tj - a 3 A Pj 
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(d) z J+1 = M~ l r j+1 

(^) 1^3 (^j + l > ^j + 1 )/(^j 5 Zj ) 

(0 P,+l = 3+1 + 0jPj 

Note that even though M-inner products are used, we do not need to multiply by M which 
may not be available explicitly; we only need to solve linear systems with the matrix co- 
efficient M. It can also be seen that with a change of variables, the iterates produced by 
this algorithm are identical to both those of CG with split preconditioning, and CG with 
right-preconditioning using the M ~ 1 -inner product. All three of these algorithms are thus 
equivalent. 

The question we now raise is the following. When A is only nearly symmetric, it is 
often the case that there exists a preconditioner M which is SPD. In this situation, the use 
of either of the forms (2) or (3) is just as unsatisfactory as in the fully symmetric case. 
Indeed, whatever degree of symmetry was available in A is now entirely lost. Although 
the above remedy based on M-inner products is always used in the symmetric case, it is 
rather surprising that this problem is seldom ever mentioned in the literature for the nearly 
symmetric case. In the nonsymmetric case, when M exists in factored form, some form of 
balancing can also be achieved by splitting the preconditioner from left and right. However, 
there does not seem to have been much work done in exploiting M- inner products even when 
M is not available in factored form. This dichotomy between the treatment of the symmetric 
and the nonsymmetric cases is what motivated this study. 

Ashby et. al. have fully considered the case of using alternate inner products when the 
matrix A is symmetric. Work of which we are aware that consider the use of alternate inner 
products when A is near-symmetric are Young and Jea [9] and Meyer [5]. In the latter, the 
A T M~ 1 A inner product (with M SPD) is used with ORTHOMIN and ORTHODIR. 

This paper is organized as follows. In Section 2, it is shown how alternative inner products 
may be used to preserve symmetry in GMRES. Section 3 considers the use of truncated 
iterative methods when the preconditioned system is close to being symmetric. This has 
been hypothesized by many authors, for example, Axelsson [2] and Meyer [5]. In Section 4 we 
consider the symmetrically preconditioned Bi-CG algorithm. Section 5 tests these algorithms 
numerically using a Navier-Stokes problem parameterized by the Reynolds number, and thus 
nearness to symmetry. We conclude this paper in Section 6. 


2 Symmetric preconditioning in GMRES 

When A is nearly symmetric, split preconditioning may be used to preserve the original 
degree of symmetry. Alternatively, left-preconditioning with the M-inner product, or right- 
preconditioning with the M _1 -inner product may be used. These latter two alternatives 
will be developed for the Arnoldi process, and used as the basis of ‘symmetric’ precondi- 
tioned versions of GMRES. Like CG, it will be shown that these symmetric versions are 
mathematically equivalent to split preconditioning, but do not require the preconditioner to 
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be symmetrically factored. We begin by exploring the options and implementation issues 
associated with left symmetric preconditioning. 

2.1 Left symmetric preconditioning 

The GMRES algorithm is based on the Arnoldi process. Without preconditioning, the 
Arnoldi algorithm based on the classical Gram-Schmidt process is as follows. 

Algorithm 2.1 Arnoldi-Classical Gram-Schmidt 

1. Choose a vector v 1 of norm 1. 

2. For j = 1 , 2, . . . , m do: 

3. z = Avj 

4. Compute h tJ — (z, u t ) for i = 1, 2, . . . , j 

5. z = z- E- =1 h ijVi 

6 - Vij = IMIs 

7. If hj + ij = 0 then Stop. 

8. v j+i z /hj+i,j 

9. EndDo 

Consider here the case when A is left-preconditioned, i.e., the matrix A involved in the 
algorithm is to be replaced by B = M~ l A, where M is some preconditioner, which we 
assume to be SPD. We wish to define a procedure to implement the above algorithm for the 
matrix B — M~ l A , using M-inner products, and if possible, avoid additional matrix- vector 
products, e.g., with M . Once this is accomplished, we will define the corresponding GMRES 
procedure. 

In the preconditioned case, it is customary to define the intermediate vectors in the 


product z — M~ 1 Av J as 


II 

‘ 

9 

(5) 

which is then preconditioned to get 


7 

5 

II 

fT 

(6) 


We now reformulate the operations of the above algorithm for B — M~ l A, with the M-inner 
product. Only the computation of the inner products changes. In the classical Gram-Schmidt 
formulation, we would first compute the scalars h- in Line 4 of Algorithm 2.1, 

hij - (zj,Vi) M = (Mz jy v { ) = (wjyVi) , i = l,...,j. (7) 

Then we would modify the vector z 3 to obtain the next Arnoldi vector (before normalization), 

j 

*3 - z j - E M 

i - 1 
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To complete the orthonormalization step we must normalize the final z y Because of the 
Af-orthogonality of Zj versus all previous u t ’s we observe that 

( Zj ■> Z j ) M ( Zj •> Zj ) M ( Af Wj i ( W j , Z j ) . ( 9 ) 

Thus, the desired Af-norm can be computed according to (9) and then computing 

h j+ hj = (zj> w j) 1/2 and v j + 1 = Zj/hj+ij- (10) 

One potentially serious difficulty with the above procedure is that the inner product 
( Zj,Zj) M as computed by (9) may be negative in the presence of round-off. There are two 
remedies. First, we can compute this Af-norm explicitly at the expense of an additional 
matrix- vector multiplication with Af, i.e., from 

As was pointed out earlier, this is undesirable, since the operator Af is often not available 
explicitly. Indeed in many cases, only the preconditioning operation Af -1 is available from a 
sequence of operations, as is the case for multigrid preconditioning. Another difficulty with 
computing h tJ with (7) is that it is not immediately amenable to a modified Gram-Schmidt 
implementation. Indeed, consider the first step of a hypothetical modified Gram-Schmidt 
step, which consists of Af-orthonormalizing z against iq, 

Ki = , z = z- h a v l . 

As was observed, the inner product (z^v^xf is equal to (uqtq) which is computable. Now 
we need Mz to compute (z — h il v 1 ^v i ) M in the modified Gram-Schmidt process. However, 
no such vector is available, and we can only compute the (z,v { ) M of classical Gram-Schmidt. 

An alternative is to save the set of vectors Mv { (again, not computed by multiplying by 
M ) which would allow us to accumulate inexpensively both the vector Zj and the vector w 3 
via the relation 

j 

w 3 = Mz 3 = Wj — ^2 h l] Mv l , 
i= 1 

which is obtained from (8). Now the inner product (zj^Zj)j^ is given by 

(zjJj)m = . 

In this form, this inner product is guaranteed to be nonnegative as desired. This leads to 
the following algorithm. 


Algorithm 2.2 Arnoldi-Classical Gram-Schmidt and M -inner products 

1. Choose a vector w 1 such that v 1 = M~ 1 w l has M-norm 1. 

2. For j = 1, 2, . . . , m do: 

3. w = Avj 
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4. 

5. 

6 . 

7. 

8 . 

9. 

10 . 


Compute h i; j = ( w , v t ) for i = 1, 2, . . . , j 

w = w- ELi h H w i 

z — M~ l w . 

hj +1 j = (z,^) 1 / 2 . If h J+l j = 0 then Stop. 
w i+i = ™/ h j + ij 
v i+i 
End Do 


As is noted, the above algorithm requires that we save two sets of vectors: the ?; J 1 s and 
the Wi s. The u t ’s form the needed Arnoldi basis, and the rc/s are required when computing 
the vector w 3 in Line 5. If we do save these two sets of vectors we can also now easily 
formulate the algorithm with the modified Gram- Schmidt version of the Arnoldi procedure. 


Algorithm 2.3 Arnoldi-Modified Gram-Schmidt and M-inner products 

1. Choose a vector w 1 such that tq = M^Wi has M-norm 1. 

2. For j = 1 , 2, . . . , m do: 

3. w = Avj 

4. For i = 1, . . . , j do: 

5. h i3 = (w,Vi) 

6. w — w — h-w, 

7. EndDo 

8. z — M~ l w 

9. = ( 2 , 1 c) 1 / 2 . If h 3+l j = 0 then Stop. 

10. w J+l = w/h j+1J 

11. v J+I = z/h J+liJ 

12. EndDo 


2.2 Right symmetric preconditioning 

The matrix AM -1 is self-adjoint with the M _1 -inner product. The situation for right- 
preconditioning with this inner product is much simpler, mainly because M~ l z is available 
when 2 needs to be normalized in the M~ l norm. However, M~ l z is normally computed at 
the next iteration in the standard Arnoldi algorithm; a slight reorganization of the Arnoldi- 
Modified Gram-Schmidt algorithm yields the following. 


Algorithm 2.4 Arnoldi-Modified Gram-Schmidt and M~ l -inner products 

1. Choose a vector v 1 of M~ l -norm 1 , compute w 1 = M~ 1 v l 

2. For j = 1, 2, . . . , m do: 

3. z = Aw 3 
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For i — 1 , . . . , j do: 
h%j = (z,Wi) 
z - z - h {j v t 
EndDo 
w — M~ l z 

h J+ lt j = ( z,w) l l 2 . If hj +1 j = 0 then Stop. 

Wj+l = w/Vij 
u j+i z /hj+i,j 
EndDo 

Note that the preconditioned vector is computed in Line 8, while in the standard algo- 
rithm it is computed before Line 3. Again, both the ids and the ltd s need to be saved, where 
Wj = M _1 Vj in this case. 

The additional storage of the itds, however, makes this algorithm naturally ‘flexible,’ i.e., 
it accommodates the situation where M varies at each step as when M~ l v is the result 
of some unspecified computation. If M~ l is not a constant operator, then a basis for the 
right-preconditioned Krylov subspace cannot be constructed from the ids alone. However, 
the vectors w- = M ~ 1 v 3 do form a basis for this subspace, where M~ l denotes the precon- 
ditioning operation at the j-th step. The use of this extra set of vectors is exactly how the 
standard flexible variant of GMRES is implemented [6]. 


4. 

5. 

6 . 

7. 

8 . 

9. 

10. 
11 . 
12 . 


2.3 Using M-inner products in GMRES 

The vectors v t form an orthonormal basis of the Krylov subspace. In the following we 
denote by V m = [fj, . . . , v m \ the matrix whose column vectors are the vectors produced by 
the Arnoldi-Modified Gram-Schmidt algorithm with M-inner products (Algorithm 2.3). A 
similar notation will be used for the matrix W m . We also denote by H m the (m-f 1) x m upper 
Hessenberg matrix whose nonzero entries h tJ are defined by the algorithm. H m denotes the 
top m x m portion of H m . These matrices satisfy a number of relations similar to the ones 
obtained from using the standard Euclidean inner product. 

Proposition 2.1 The following properties are satisfied by the vectors v i and w i in Algorithm 
2.3: 

1. M~ l AV m = V m+l H m , 

2. AV m = W m+1 H m , 

3. VTMV m = I, 

4. W T M-'W m = I, 

5. If A is Hermitian then H m is Hermitian and tridiagonal. 
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Consider now the implementation of a GMRES procedure based on the orthogonalization 
process of Algorithm 2.3. Since we are using Af-inner products we should be able to minimize 
the M- norm of the residual vectors M~ l b — M~ x Ax over all vectors of the affine subspace 
x 0 + K m in which, 

K m = span{z 0 ,M- 1 Az 0 ,...,(M- 1 A) m - 1 z 0 } (11) 

where z 0 = M~ 1 r 0 , and r 0 = b— Ax 0 . Define (3 — ||r 0 ||M and e 1 as the first coordinate vector. 
Then we have, 

b~Ax = b- A(x 0 AV m y) 

= r 0 - AV m y 
r o fCm + 1 y 

= Wm+i(/?ei - H m y). 

Therefore we have the equality, 

|| M-'(b-AxW„ = \\M-'W m+1 (0e, - H m yW M 

= (M-W m+1 (/Je, - H m y), W m+1 (/Je, - H m y )) 

= (^ +1 M-W m+1 (/Je, - H m y),(0e , - H m y)) 

= We, - H m yf 2 . (12) 

A result of the equality (12) is that we can minimize the M-norm of the (preconditioned) 
residual vector M~ 1 (b— Ax) by simply minimizing the 2-norm of (3e x — H m y as in the standard 
GMRES algorithm. 

Algorithm 2.5 Left-preconditioned GMRES with M-inner products 

0. Compute r 0 = b — Ax 0 and z = M~ 1 r 0 

1. Compute (3 — (r 0 , z) 1 / 2 ; — z/f3 and w l = r 0 / (3 

2. For j — 1 , 2, . . . , m do: 

3. w = Avj 

4. For i = 1, . . . , j do: 

5. h ij ^{w,v i ) 

6. w — w — h tJ w l 

7. End Do 

8. z = M~ l w . 

9. h J+ ij = (z,w) l l 2 . If hj + ij = 0 then Stop. 

10. U) J+1 = w/h j+ Jj 

11. v H1 = z/h j+ u 

12. EndDo 

13. Compute the minimizer y m of \\(3e 1 — H m y || 2 

14. Compute the approximate solution x m — x 0 + V m y m 

15. If satisfied Stop; else set x 0 — x m and goto 1. 
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An equality similar to (12) can be shown for the right-preconditioned case with M~ l - 
inner products. We can summarize with the following theorem, which we state without 
proof. 


Theorem 2.1 The approximate solution x m obtained from the left-preconditioned GMRES 
algorithm with M-inner products minimizes the residual M-norm ||M” 1 (6— Aa;)||jv/ over all 
vectors of the affine subspace x 0 + K m in which 

Km = span{z 0 ,M- 1 Az 0 ,...,(M- 1 A) rn ~ 1 z 0 } (13) 

where z 0 = M~ l r 0 . Also, the approximate solution x m obtained from the right-preconditioned 
GMRES algorithm with M~ 1 -inner products minimizes the residual M~ l -norm ||6— 
over the same affine subspace. 


2.4 Equivalence of the algorithms 

We can show that both left and right symmetric preconditioning are mathematically equiv- 
alent to split preconditioning. In the latter case, M must be factored into M = LL T and we 
solve 

L~ l AL~ T u — L~ l b, x = L~ T u. (14) 

Denoting by B the preconditioned matrix B — L~ 1 AL~ T , the GMRES procedure applied to 
the above system for the u variable, minimizes the residual norm, 

\\L-'(b-AL-Tu)\\ 2 

over all vectors u in the space u 0 -(- KG) with 

K m ] = span{r 0 , Br 0 , . . . , B™- 1 ^} 

in which 

r 0 = L~ l (b— AL~ t u q ) — L~ 1 (b— Ax 0 ) = L~ 1 r 0 . 

Note that the variables u and x are related by x = L~ T u. As a result, this procedure 
minimizes 

\\L~'(b-Ax)\\ 2 (15) 

over all x in the space x 0 -f KG) with 

KG) = span{L~ T r 0 , L~ T Br 0 , . . . , L~ T B m ~ l r Q }. 

We now make the following observation. For any k > 0 we have, 

L~ T B k h 0 = L- T B k L~ 1 r 0 = (M~ l A) k z 0 
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where z 0 = M l r 0 . Indeed, this can be easily proved by induction. Hence, the space Kffl is 
identical with the space 


K m ] = s Pan{z 0 , M~ 1 Az 0 , . . . , {M~ l A) m ~ l z 0 ) 


which is nothing but (13). In noting that 

II l-'(b - Ax) ||, = ||6 - Ax || M -, = \\M~'(b - Ax) || M (16) 

we have proved the following result. 


Theorem 2.2 Let M = LL T . Then the approximate solution obtained by GMRES applied 
to the split preconditioned system ( 1 J) is identical with that obtained from the GMRES 
algorithm for the left preconditioned system (2) using the M -inner product. 

Again, the same statement can be made about right-preconditioning. All that must be 
noticed is that the same minimization (16) is taking place, and that the minimization is over 
the same subspace in each of the left, right, and split preconditioning options [7, Sec. 9.3.4]. 
We emphasize in particular that it is the split preconditioned residual that is minimized in 
all three algorithms. 


3 Truncated iterative methods 

Truncated iterative methods are an alternative to restarting, when the number of steps re- 
quired for convergence is large and the computation and storage of the Krylov basis becomes 
excessive. When A is exactly symmetric, a three-term recurrence governs the vectors in the 
Arnoldi process, and it is only necessary to orthogonalize the current Arnoldi vector against 
the previous two vectors. If A is nearly symmetric, an incomplete orthogonalization against 
a small number of previous vectors may be advantageous over restarted methods. The ad- 
vantage here may offset the cost of maintaining the extra set of vectors to maintain the 
initial degree of symmetry. The incomplete Arnoldi procedure outlined below stores only 
the previous k Arnoldi vectors, and orthogonalizes the new vectors against them. It differs 
from the full Arnoldi procedure only in Line 4, which would normally be a loop from 1 to j. 
It can be considered to be the full Arnoldi procedure when k is set to infinity. 


Algorithm 3.1 Incomplete Arnoldi Procedure 

1. Choose a vector v 1 of norm 1. 

2. For j — 1,2, ... , m do 

3. w — Avj 

4. For i = max{ 1, j — k + 1 } , . . . , j do 

5. h tJ = (w,v t ) 
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w = w — h tJ v t 
EndDo 

hj+ij = lkl| 2 
K hj+ij — 0 then Stop. 
v j+i = u >/h j+lj 
EndDo 

The truncated version of GMRES uses this incomplete Arnoldi procedure and is called 
Quasi-GMRES [3]. The practical implementation of this algorithm allows the solution to be 
updated at each iteration, and is thus called a ‘direct’ version, or DQGMRES [8]. 

To suggest that truncated iterative methods may be effective in cases of near symmetry, 
we study the asymptotic behavior of the iterates of DQGMRES as the coefficient matrix A 
varies from nonsymmetry to (skew) symmetry. We first decompose A as 


6 . 

7. 

8 . 

9. 

10 . 
11 . 


A = S + B 


in which S is symmetric or skew symmetric, and set 


* = 11 * 11 * 


We will first establish asymptotic relations among the variables in the incomplete and full 
Arnoldi procedures. Then we will apply the incomplete procedure to A , and the full proce- 
dure to S, using the superscripts I and F to distinguish between the variables appearing in 
the two procedures. (Note that since S is (skew) symmetric, the full procedure on S is the 
same as the incomplete procedure with k > 2.) 

Moreover, if we denote the degree of the minimal polynomial of v ^ with respect to S by 
z/, then h F +l = 0 and h F +1 . ^ 0 for 1 < j < v. In the proof of the following lemma, we 
also use v! and v F to denote the vectors w 1 and w F obtained at the end of Line 7 in the 

3 3 

incomplete and complete Arnoldi procedures. 

Lemma 3.1 Assume the truncation parameter k > 2. If — v F + 0(e), then 


Hj = h F + 0(e ) , vj = v F + 0(e) 
where 1 < j < i/ and max{ 1 , j — k + 1} < i < j + 1 . 

Proof. The proof is by induction on the index j. By Lines 5 and 6 of the Arnoldi procedure, 
Mi = v* 2 = Av* - h^v* , ^ = 11^112, 

we have 

h{ 1 = (Sv F ,v F ) + 0(e) = h F 1 +0(e), 


V 2 = Sv F - h F x v F + 0(e) = v F + 0(e ) , 

Mi ~ Pf II2 + ^( £ ) — ^21 + O(e) 
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and hence the lemma holds for j — 1. Assume that the lemma has been proved for j < j 0 < v. 
On that hypothesis, we prove it for j — j 0 . By Line 10 of the Arnoldi procedure, 


v! = v 1 jh 1 ■ 1 

Jo Jo' Jo iJO 1 


which yields that 


v. — 

Jo 


+ 0(e) 

h F . , + 0(e) 

JO JO-1 v 7 


Jo 


h F . , 

Jo ,J 0 - 1 


+ 0(e) = v F + 0(e) 


by the induction hypothesis. Therefore 

w 1 — Av 1 — Sv F A 0(e) — w 1 ' A 0(e) 

Jo JO v 7 v 7 

for the w 1 and w F in Line 3 of the Arnoldi procedures. Using another induction on the index 
i in Lines 5 and 6, and the induction hypothesis on j and, in the mean time, noting that 
h F = 0 for 1 < i < in — 2, we have 

h' =h F . + 0(e) 

iJO Oo v 7 


for max{l,j 0 — k A 1} < * < j 0 and 


5 lo+l = ^+1 + • 


From the last equation, 


A +1J0 = P' 0+1 I| 2 = ll«Lills + 0(e) = + 0(e) 


J0 + 1 


Jo 4- 1, jo 


and then the induction step is complete. QED. 

We now turn to the DQGMRES algorithm. Consider the linear system 

Sx = b 


and denote by x G and x® the approximate solutions by the GMRES and DQGMRES algo- 
rithms, respectively. Let /i be the degree of the minimal polynomial of the vector b — Sx 0 
with respect to S. A result of the lemma can be stated as follows. 

Theorem 3.1 Given the same initial guess x 0 to GMRES and DQGMRES with k > 2, then 
at any given step m with 1 < m < fi, 


x® = x G A 0(e ) . 

m m v ' 

Proof. By the definitions of DQGMRES and GMRES, we have 

X Q =x 0 + /3QV‘ ((ti'Y H 1 ) * (h 1 ) T e, 

m u'r- m \\ mj ml \ m ) 1 
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and 



where (3® = \\b — Ax 0 || 2 and (3 G = ||6 — 5a: 0 [| 2 . Since 

v l = Jq( 6 ~ Ax o) = ^o) + 0(e) = v[ + 0(e) , 

we have by the lemma, 

H' =i)F + 0(e) , V’ = V F + 0(e) 
and therefore the desired equation holds. QED. 

If we let x A be the exact solution to Ax = b and x s be the exact solution to Sx = 6, then 
it is obvious that x A = + 0(e). Since, on the other hand, x G — x s we immediately have 

the following corollary. 

Corollary 3.1 For any initial guess x 0 and any k >2, 

= x A + 0(e) . 

The corollary suggests that we may use DQGMRES with small k when A is nearly 
symmetric or nearly skew symmetric. 


4 Symmetric preconditioning in Bi-CG 

The Bi-CG algorithm is based on Lanczos biorthogonalization. Both left-symmetric and 
right-symmetric preconditioning are relatively straightforward, and no extra vectors are re- 
quired. For reference, Algorithm 4.1 gives the right-preconditioned Bi-CG algorithm with 
preconditioner M. The symmetric right-preconditioned Bi-CG algorithm (right-preconditioned 
Bi-CG using M ~ l -inner products) is developed immediately afterward. 


Algorithm 4.1 Right-preconditioned Bi-CG 


1. 

2. 

3. 

4. 

5. 

6. 

7. 

8 . 

9. 

10 . 
11. 


Compute r 0 = b — Ax 0 . Choose r* such that (r 0 ,rj) ^ 0. 
Set p 0 = r 0 , pj = r* 

For j = 0, 1, • ■ • , until convergence Do: 


= {r j ,r*)l(AM~ x p v p’’) 

x j + i = x j + 

r j+i = r j ~ “jAM-'pj 

r* — r* — 0:M~ T A 1 p* 

Pj = ( r j + l ! r ’ +1 )/( r jl 

P, + 1 = ^ + 1 + PjPj 

P* +1 = '•*+, + P,P * 

End Do 
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Note in Line 7 of the above algorithm that the preconditioned coefficient matrix of the dual 
system is (AM~ l ) T — M~ T A T , i.e., the dual residual r* is the residual of 

M~ T A T x* = 6*, 

which is a left - preconditioned version of some linear system with A T . 

To develop the symmetric right-preconditioned Bi-CG, M” 1 -inner products are used in 
Algorithm 4.1 above. However, the preconditioned coefficient matrix of the dual system 
must be the adjoint of AM~ l in the inner product. This is A 1 M~ 1 as shown by 

(AM~ l x,y) M -i = (M~ l AM~ l x, y) = (x, M~ 1 A T M~ 1 y) = (x, A T M~ 1 y) M -i. 

The dual system thus involves the coefficient matrix A T M~ l . Algorithm 4.2 gives the sym- 
metric right-preconditioned Bi-CG algorithm with preconditioner M. 


Algorithm 4.2 Right-preconditioned Bi-CG with M~ l -inner products 

1. Compute r 0 = b — Ax 0 . Choose r* such that (M -1 r 0 ,r*) ^ 0. 

2. Set p 0 — M~ 1 r 0 , p* = M~ 1 r J 

3. For j — 0, 1, • • • , untiJ convergence Do: 

4. a, = (M- l r v r-)/(Ap v p’) 

5. x j+l = x, + a jPj 

6. r i+1 =rj- ajApj 

7. r * = r*. — a^A T p* 

8. (3j = (M- l r 3 + u r* +l )l(M~ l r v r*) 

9. p j+1 = M~ 1 r J+l + p jPj 

10. p* +1 = + fop* 

11. EndDo 

Like GMRES, both left and right symmetric preconditioned versions of Bi-CG are equiv- 
alent to the split preconditioned version, and this can be shown by a change of variables. 
However, in both left and right symmetric preconditioned versions, the exact, rather than 
the split preconditioned residual is available. 

The unpreconditioned Bi-CG algorithm cannot have a serious breakdown if A is S PD and 
r* is chosen to be r 0 . This is because r* = r 3 and p* = p 3 for all j and the vectors Ap 3 , p* 
never become orthogonal. In fact, the cosine 

( a p } ’P' 3 ) = ( A Pj.V]) Up, II 
PpjI!IIp*II l|Pjlll|p*ll PpjII 

can be bounded below by the reciprocal of the condition number of A. 


14 



Similarly, in the symmetric right-preconditioned version of Bi-CG, if both A and M are 
SPD, and r* = r 0 , then r * = r 3 and p * — p 3 for all j , and 


(Ap^p*) 

’ 3 > cond-\A ) 


PPjIIIIpJII 

(A/ - 1 rj,r*) 

||M- 1 r J ||||r*|| 


> cond '(M). 


We measure the cosines rather than the quantities (y4pj,p*) and r*) because the 

p and r vectors have magnitudes going to 0 as the algorithms progress. Recall that in the 
case when (M~ x r 3 , r*) = 0 and r 3 = 0, we have a lucky breakdown. 

For the case of regular right- or left-preconditioning, or if r* ^ r 0 in the symmetrically 
preconditioned cases, then no such lower bounds as the above exist, and the algorithms are 
liable to break down. 

When A is near-symmetric, it is our hypothesis that the probability of breakdown is 
lower in the symmetrically preconditioned cases, and this will be shown by experiment in 
the next section. 
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5 Numerical Experiments 

Section 5.1 tests the idea of using symmetric preconditionings with truncated iterative meth- 
ods. Section 5.2 tests the breakdown behavior of symmetrically preconditioned Bi-CG. 


5.1 Truncated iterative methods 

To test the idea of using symmetric preconditionings with truncated iterative methods for 
nearly symmetric systems, we selected a standard fluid flow problem where the degree of 
symmetry in the matrices is parameterized by the Reynolds number. The flow problem 
is the two-dimensional square-lid driven cavity, and was discretized by the Galerkin Finite 
Element method. Rectangular elements were used, with biquadratic basis functions for 
velocities, and linear discontinuous basis functions for pressure. We considered a segregated 
solution method for the Navier-Stokes equations, where the velocity and pressure variables 
are solved separately; the matrices arising from a fully-coupled solution method are otherwise 
indefinite. In particular, we considered the expression of the conservation of momentum, 

Re(u • Vu) — — Vp+ V 2 u 

where u denotes the vector of velocity variables, p denotes the pressure variable, and Re is 
the Reynolds number. The boundary conditions for the driven cavity problem over the unit 
square are u — (1,0) 7 on the top edge of the square, and u = (0,0) T on the other three sides 
and the corners. The reference pressure specified at the bottom-left corner is 0. 

The matrices are the initial Jacobians at each Newton iteration, assuming a zero pressure 
distribution. For convenience, however, we chose the right-hand sides of the linear systems 
to be the vector of all ones. A mesh of 20 by 20 elements was used, leading to momentum 
equation matrices of order 3042 and having 91204 nonzero entries. The nodes corresponding 
to the boundaries were not assembled into the matrix, and the degrees of freedom were 
numbered element by element. For Reynolds number 0., the matrix is SPD, and is equal to 
the symmetric part of the matrices with nonzero Reynolds number. 

We generated matrices with Reynolds number less than 10, which gives rise to the nearly 
symmetric case. For Reynolds number 1., the degree of symmetry measured by 

\\A + AT\\ f 

has value 7.5102 x 10~ 4 and this measure increases linearly with the Reynolds number (at 
least up to Re=10). 

In the numerical experiments below, we show the number of matrix-vector products 
consumed by GMRES(/c) and DQGMRES(fc) to reduce the actual residual norm to less than 
10~ 6 of the original residual norm, with a zero initial guess. Several values of k are used. A 
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dagger (f) in the tables indicates that there was no convergence within 500 matrix-vector 
products. The incomplete Choleski factorization IC(0) of the Re=0 problem was used as the 
preconditioner in all the problems. 

For comparison, we first show in Table 1, the results using standard right-preconditioning. 
Table 2 shows the results using right-preconditioning with M~ x inner products, or equiva- 
lently, split preconditioning. In the latter case, care was taken to assure that GMRES did 
not stop before the actual residual norm was within twice the tolerance. For DQGMRES, 
since an accurate residual norm estimate is not available within the algorithm, the exact 
residual norm was computed and used for the stopping criterion for the purpose of this com- 
parison. The right-preconditioned methods have a slight advantage in this comparison (by 
as many as 20 Mat-Vec’s), since they directly minimize the actual residual norm, whereas 
the symmetrically preconditioned methods minimize a preconditioned residual norm. 


Re. 

GMRES(fc) 

DQGMRES(fc) 

5 

10 

oo 

2 

3 

4 

5 

6 

7 

8 

9 

10 

0 

232 

129 

59 

76 

220 

105 

72 

92 

t 

160 

83 

75 

1 

218 

126 

69 

76 

276 

131 

81 

94 

t 

97 

90 

79 

2 

233 

126 

70 

78 

391 

258 

82 

95 

t 

98 

94 

78 

3 

208 

126 

71 

87 

346 

232 

85 

95 

t 

99 

97 

79 

4 

214 

128 

72 

94 

345 

t 

88 

95 

477 

108 

98 

86 

5 

210 

128 


192 

394 

t 

91 

95 

326 

128 

99 

90 

6 

214 



446 

361 

t 

94 

97 

258 

197 

100 

94 

7 

215 



t 

345 

t 

97 

99 

239 

229 

101 

96 


Table 1: Mat-Vec’s for convergence for right-preconditioned methods. 


Re. 

GMRES (k) 

DQGMRES(fc) 

5 

10 

oo 

2 

3 

4 

5 

6 

7 

8 

9 

10 

0 

243 

119 

57 

58 

58 

58 

58 

58 

58 

58 

58 

58 

1 

243 

119 

67 

75 

74 

74 

75 

74 

74 

74 

75 

75 

2 

244 

120 

68 

78 

78 

78 

79 

78 

78 

78 

78 

78 

3 

244 

121 

69 

88 

87 

87 

87 

87 

86 

86 

87 

87 

4 

244 

122 

70 

108 

95 

95 

95 

93 

91 

91 

93 

95 

5 

244 

126 

70 

t 

105 

103 

105 

100 

97 

96 

101 


6 

244 

127 

71 

t 

118 

111 

119 

108 

101 

101 

110 

117 

7 

243 

128 

71 

t 

131 

121 

139 

117 

104 

105 

121 

139 


Table 2: Mat-Vec’s for convergence for symmetric right-preconditioned methods. 


The results in Table 1 show the irregular performance of DQGMRES(fc) for these small 
values of k when the preconditioned system is not symmetric. The performance is entirely 
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regular in Table 2, where the preconditioned system is near symmetric. For Reynolds num- 
bers up to 3, the systems are sufficiently symmetric so that DQGMRES(2) behaves the same 
as DQGMRES with much larger k. The performance remains regular until beyond Reynolds 
number 7, when the number of steps to convergence begins to become irregular, like in the 
right-preconditioned case. 

GMRES with either right or symmetric preconditioning does not show any marked dif- 
ference in performance; apparently the symmetry of the preconditioned system is not as 
essential here for this problem. However, the results do show that DQGMRES (k) with small 
values of k may perform as well, in terms of number of steps, as GMRES(fc) with large values 
of k , particularly if near-symmetry is preserved. Since the former is much more efficient, the 
combination of preserving symmetry and truncated iterative methods may result in a much 
more economical method, as well as the more regular behavior shown above. 

We also performed the same experiments with orthogonal projection methods, namely 
the Full Orthogonalization Method (FOM) and its truncated variant, the Direct Incomplete 
Orthogonalization Method (DIOM) [7]. The results were very similar to the results above, 
and are not shown here. Indeed, the development of the algorithms and the theory above is 
identical for these methods. 

For interest, we also performed tests where an ILU(O) preconditioner was constructed for 
each matrix and compared right and split preconditioning. For the near-symmetric systems 
here, there was very little difference in these results compared to using IC(0) constructed 
from the Re=0 case for all the matrices. Thus the deterioration in performance as the 
Reynolds number increases is not entirely due to a relatively less accurate preconditioner, 
but is more due to the increased nonsymmetry and non-normality of the matrices. Al- 
though the eigenvalues of the preconditioned matrices are identical, their eigenvectors and 
hence their degree of non-normality may change completely. Unfortunately, it is difficult to 
quantitatively relate non-normality and convergence. 


5.2 Breakdown behavior of Bi-CG 


To test the breakdown behavior of Bi-CG, MATLAB was used to generate random matrices of 
order 300 with approximately 50 percent normally distributed nonzero entries. The matrices 
were adjusted so that 


A + A T 


(°rmn +10 5 )/ + £ 


A- A T 


i.e., the symmetric part was shifted so that the lowest eigenvalue was 10 -5 and then £ times 
the skew-symmetric part was added back. The parameter £ was altered to get varying degrees 
of nonsymmetry. 

For each £ that we tested, 100 matrices were generated, and the smallest value of the 
cosines corresponding to the denominators in the algorithms were recorded. In the right- 
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preconditioned case, we recorded the minimum of 

(AM~ l pj,p*) (r v r*) 

— and — 

||4M- 1 p J ||||p*|| ||r J ||||r*|| 

for ail j, and for the symmetric right-preconditioned case, we recorded the minimum of 

(Api,p*) (M -1 r ? , r*) 

P pjipjil Und l|M- 1 r i ||||r*|| 

for all j. The relative residual norm reduction was 10~ 9 when the iterations were stopped. 
The initial guesses were 0, and r* was set to r 0 . IC(0) of the symmetric part was used as 
the preconditioner. 

Table 3 shows the frequencies of the size of minimum cosines for the right-preconditioned 
(first row of each pair of rows) and the symmetrically-preconditioned cases (second row of 
each pair of rows). For example, all 100 minimum cosines were between 10 -3 and 3 x 10~ 3 in 
the symmetrically-preconditioned case. The average number of Bi-CG steps and the average 
minimum cosine is also shown. The last column, labeled ‘better’, shows the number of times 
that the minimum cosine was higher in the improved algorithm. 

The Table shows that the right-preconditioned algorithm can produce much smaller 
cosines, indicating a greater probability for breakdown. The difference between the algo- 
rithms is less as the degree of nonsymmetry is increased. For e = 0.1, there is almost no 
difference in the breakdown behavior of the algorithms. The Table shows that the number of 
Bi-CG steps is not significantly reduced in the new algorithm, nor is the average minimum 
cosine of the modified algorithm significantly increased. It is the probability that a small 
cosine is not encountered that is better. 

It is important to note that this behavior only applies when r* is set to r 0 . When r* is 
chosen randomly, there is no gain in the symmetrically-preconditioned algorithm, as shown 
in Table 4. 

Table 5 shows the number of steps and the minimum cosines for the two algorithms 
applied to the driven cavity problem described in Section 5.1 above. Figure 1 shows a plot of 
the minimum cosines as the two algorithms progress for the Re = 1 problem. Note that the 
minimum cosines are higher and much smoother in the symmetrically-preconditioned case. 
In the Re = 7 problem, the cosines are still higher, but the smoothness is lost. 


6 Conclusions 

When solving linear systems with matrices that are very close to being symmetric, this paper 
has shown that it is possible to improve upon the standard practice of using a (nonsymmetric) 
preconditioner for that matrix along with a left- or right-preconditioned iterative method. 
The original degree of symmetry may be maintained by using a symmetric preconditioner 
and an alternate inner product (or split preconditioning, if appropriate). By combining this 
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e 

\\\A+AT \\ F ) 

steps 

3e-6 

le-5 

le-5 

3e-5 

3e-5 
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le-3 
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3e-3 
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1 

82 

B 

H 
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92 
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29.27 
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0 

3 

2 


29 

32 

20 

3 

7.25 
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1 

B 

1 

6 

88 

2 
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77 
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3 

8 


36 

31 
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3 

B 

18 

39 

26 

5 
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69 
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II 
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Table 3: Frequencies of minimum cosines for right-preconditioned (first row of each pair of 
rows) and symmetrically-preconditioned (second row of each pair of rows) Bi-CG. 


e 

steps 

3e-6 

le-5 

3e-5 

le-4 

3e-4 

le-3 
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3e-2 
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better 
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3e-5 
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Table 4: Frequencies of minimum cosines when r* is chosen randomly. 
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Re. 

Bi-CG steps 

min cosines 

right 

symm 

right 

symm 

0 

70 

62 

1.52e-4 

1.45e-l 

1 

71 

68 

1.08e-4 

6.73e-3 

2 

74 

72 

2.44e-4 

5.12e-4 

3 

73 

72 

2.02e-4 

9.07e-3 

4 

77 

72 

1.93e-5 

6.52e-3 

5 

80 

75 

5.54e-5 

5.19e-4 

6 

80 

78 

1.91e-4 

4.30e-5 

7 

80 

80 

1.87e-4 

1.02e-3 


Table 5: Steps and minimum cosines for the driven cavity problem. 



Figure 1: Minimum cosines in right-preconditioned Bi-CG (solid line) and symmetrically- 
preconditioned Bi-CG (dashed line) for the Re = 1 problem. 
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idea with truncated iterative methods, solution procedures that converge more quickly and 
require less storage are developed. The truncated methods also seem to become more robust 
with the truncation parameter k when near-symmetry is maintained. The Bi-CG algorithm 
also seems to be more robust with respect to serious breakdown when near-symmetry is 
maintained. 
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